flowchart LR
A[("Raw reads")] --> B{"fastp"}
B --> C{"FastQC"} & E{"STAR"}
C --> D{"MultiQC"}
E --> F{"HTSeq"}
F --> G[("Count table")]
G --> I{"DESeq2"} & Z{"limma"} & W{"PCA"}
Z --> H{"PCA"} & J{"BioNERO"}
I --> K{"GSEA"}
J --> L{"ORA"} & N{"STRING"}
G --> V{"GSVA"}
A:::Aqua
B:::Aqua
B:::Pine
C:::Pine
E:::Pine
D:::Pine
F:::Pine
G:::Aqua
I:::Pine
Z:::Pine
W:::Pine
H:::Pine
J:::Pine
K:::Pine
L:::Pine
N:::Pine
V:::Pine
classDef Aqua stroke-width:1px, stroke-dasharray:none, stroke:#46EDC8, fill:#DEFFF8, color:#378E7A
classDef Pine stroke-width:1px, stroke-dasharray:none, stroke:#254336, fill:#27654A, color:#FFFFFF
Bulk RNA-seq analysis of mice model melanoma
Material and methods
Quality control of experimental mice transcriptomics data was performed using FastQC (v0.12.1). MultiQC tool was used for aggregation of FastQC reports [Ewels et al., 2016]. The fastp (v0.23.4) was used to filter out low-quality reads [Chen et al., 2018]. Read alignment was performed by STAR (v2.7.11) [Dobin et al., 2013] using the Mus musculus reference genome downloaded from GENCODE GRCm39.vM36 and the gtf file gencode.vM36.chr_patch_hapl_scaff.annotation.gtf. Count of gene expression profiles was performed using HTSeq count (v2.0.5) [Anders et al., 2015]. Differential expression analysis between experimental groups was performed using DESeq2 (v1.44.0) [Love et al., 2014]. Gene-set enrichment analysis (GSEA) was performed using fgsea (v1.31.6) [Korotkevich et al., 2016] and clusterProfiler (v4.12.6) [Wu et al., 2021] approaches using the following databases: MSigDB [Castanza et al., 2023]. Mapping gene identifiers between different coding systems was performed by biomaRt package. To perform the GSEA analysis using immune cells markers, a CellMarker 2.0 database were used [Hu et al., 2023]. Gene set variation analysis GSVA was used for estimating immune cell type enrichment within samples [Hänzelmann et al., 2013]. Co-expression analysis was performed using BioNERO (v1.12.0) [Almeida-Silva et al., 2022] with signed hybrid network type and Pearson correlation. Correction of expression data by differentiation method batch was performed using removeBatchEffect function from limma package [Ritchie et al., 2015]. Over-representation analysis for functional analysis of co-expressed gene modules linked to experimental groups using clusterProfiler package with MSigDB, KEGG [Kanehisa et al., 2025], Reactome [Milacic et al., 2024] and WikiPathways [Agrawal et al., 2024]. STRING database and STRINGdb Bioconductor package were used for construction of interaction networks linked to genes included to co-expressed gene modules [Szklarczyk et al., 2023]. For visualization of obtained results pheatmap (v1.0.12), ggplot2 (v3.5.1) and EnhancedVolcano packages were used. Data analysis were implemented in R (v4.4.2).
References
Database links
GENCODE: https://www.gencodegenes.org/mouse/
MSigDB: https://www.gsea-msigdb.org/gsea/msigdb
KEGG: https://www.genome.jp/kegg/
Reactome: https://reactome.org/
Wiki pathways: https://www.wikipathways.org/
CellMarker: http://www.bio-bigdata.center/
STRING: https://string-db.org/
Tool links
FastQC: https://github.com/s-andrews/FastQC
MultiQC: https://github.com/MultiQC/MultiQC
fastp: https://github.com/OpenGene/fastp
STAR: https://github.com/alexdobin/STAR
HTSeq: https://github.com/simon-anders/htseq
DESeq2: https://github.com/thelovelab/DESeq2
fgsea: https://github.com/alserglab/fgsea
clusterProfiler: https://guangchuangyu.github.io/software/clusterProfiler/
GSVA: https://bioconductor.org/packages/release/bioc/html/GSVA.html
biomaRt: https://github.com/grimbough/biomaRt
BioNERO: https://github.com/almeidasilvaf/BioNERO
limma: https://bioconductor.org/packages/release/bioc/html/limma.html
STRINGdb: https://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html
pheatmap: https://github.com/raivokolde/pheatmap
ggplot2: https://github.com/tidyverse/ggplot2
EnhancedVolcano: https://github.com/kevinblighe/EnhancedVolcano
Results
Samples and experimental groups
A total of 14 transcriptomic samples corresponding to the following three experimental groups: melanoma (n=6) and melanoma with Bifidobacterium adolescentis 150 supplement (n=6) and melanoma with Lacticaseibacillus rhamnosus K32 supplement (n=2) were added to the analysis (see Table 1). The experiment was performed in two replicates: first included only melanoma and melanoma with B. adolescentis supplement groups; second included all three experimental groups. To simplify the description of the results, we adopted the following codings:
Experimental groups:
Melanoma –>
MMelanoma with B. adolescentis supplement –>
M_BIFMelanoma with L. rhamnosus supplement –>
M_LAC
Experimental batch:
First experiment –>
batch_1Second experiment –>
batch_2
Quality control
After quality filtering using fastp, MultiQC was used for aggregate FastQC reports. Figure 1 shows the summary MultiQC quality assessment profile which shows that the sequencing data have a average quality. In summary, after quality filtering, ~ 198 million reads (14 \(\pm\) 5 per sample) added to further analysis. In average 16 \(\pm\) 7% per sample were filtered. Quality filtering statistic showed in Table 2. HTSeq tool were used for gene expression statistic counting. Figure 3 showed HTSeq counts across experimental samples. No statistically significant differences in the number of reads were found between the experimental groups. However, different batches statistically significantly differed in the read counts (see Figure 3). The gene expression counts table normalized by median of ratios method is presented in Table 4.
MiltiQC report
Reads filtering by quality statistic
Mapping reads to reference counts
Saving 5 x 7 in image

ANOVA
Df Sum Sq Mean Sq F value Pr(>F)
batch 1 851.8 851.8 210.230 4.84e-08 ***
group 2 0.1 0.0 0.006 0.994
Residuals 10 40.5 4.1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Normalized counts table
Principal component analysis
Principal component analysis (PCA) is a linear dimensional reduction technique with applications in exploratory data analysis and visualization. Before performing a PCA, expression data were transformed by variance-stabilizing transformation. This function calculates a variance stabilizing transformation (VST) from the fitted dispersion-mean relation(s) and then transforms the count data (normalized by division by the size factors or normalization factors), yielding a matrix of values which are now approximately homeostatic (having constant variance along the range of mean values). The transformation also normalizes with respect to library size. The bi-dimensional visualization using the first two components presented in Figure 4. Proportion of explained variance across principal components presented in Figure 5. According to analysis of variance using permutations (PERMANOVA) expression profiles statistical significant links to batch but not linked to experimental groups.
Saving 5 x 4 in image

Saving 5 x 4 in image

Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 9999
adonis2(formula = t(vsd) ~ batch + group, data = meta_data, permutations = 9999, method = "euclidean", by = "margin")
Df SumOfSqs R2 F Pr(>F)
batch 1 9421 0.19069 3.7386 0.0086 **
group 2 7784 0.15756 1.5445 0.1325
Residual 10 25199 0.51007
Total 13 49404 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Differential expression analysis
Differential expression analysis was performed using DESeq2 package with formula ~ batch + group for controlling dispersion linked to batch. Positive lfc and NES values in differential expression and GSEA analysis linked to B. adolescentis 150 and L. rhamnosus K32 supplement groups as well as negative values to melanoma group without any interventions. The DESeq2 package allows to estimate differential gene expression by constructing generalized linear models for the negative binomial distribution and analyzing linear contrasts between compared groups. According to obtained results 3 genes were up-regulated as well as 16 down-regulated in M_BIF group in comparison with M group. At the same time, 21 genes were linked to M_LAC groups while 0 genes linked to M group in comparison to M_LAC. When comparing bacteria supplement groups among themselves, 39 genes were linked to M_BIF whereas 116 genes were linked to M_LAC. Differentially represented genes were identified using the following significance thresholds adj. p < 0.05, |lfc| > 1. Differential analysis statistic presented in Tables 4-6. Visualization of differential expression genes of p-value - log FC measure presented via volcano plots (see Figures 6-8).
M_BIF vs M

M_LAC vs M

M_BIF vs M_LAC

Gene set enrichment analysis
Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a prior defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotype). Several databases were used to conduct the GSEA analysis, which allowed us to identify a wide range of biological functions associated with experimental groups. MSigDB hallmark gene-set were used for this analysis. The obtained results presented in Figure 9 and Tables 7-9. To identify specific expression markers characteristic of different cell types, a Cell Marker 2.0 database for mouse was used (see Figure 10 and Tables 10-12).
MSigDb Hallmark
M_BIF vs M
M_LAC vs M
M_BIF vs M_LAC

Cell Marker 2.0
M_BIF vs M
M_LAC vs M
M_BIF vs M_LAC

Prediction of immune cell counts
Mmcp counter (T cell)

Df Sum Sq Mean Sq F value Pr(>F)
group 2 149.33 74.67 30.90 5.24e-05 ***
batch 1 54.00 54.00 22.34 0.000808 ***
Residuals 10 24.17 2.42
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
GSVA
Macrophage

Df Sum Sq Mean Sq F value Pr(>F)
group 2 63.67 31.83 6.221 0.017563 *
batch 1 112.67 112.67 22.020 0.000851 ***
Residuals 10 51.17 5.12
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Df Sum Sq Mean Sq F value Pr(>F)
group 2 45.67 22.83 4.463 0.041192 *
batch 1 130.67 130.67 25.537 0.000497 ***
Residuals 10 51.17 5.12
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Df Sum Sq Mean Sq F value Pr(>F)
group 2 63.67 31.83 6.221 0.017563 *
batch 1 112.67 112.67 22.020 0.000851 ***
Residuals 10 51.17 5.12
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CD8+

Df Sum Sq Mean Sq F value Pr(>F)
group 2 49.33 24.67 2.864 0.10390
batch 1 92.04 92.04 10.687 0.00844 **
Residuals 10 86.12 8.61
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1